Building Large ROLAP Data Cubes in Parallel¹
Authors
Abstract
The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. However, as the size of data warehouses grows, the time it takes to perform this pre-computation becomes a significant performance bottleneck. This paper presents a fast parallel method for generating ROLAP data cubes on a shared-nothing multiprocessor, based on a novel optimized data partitioning technique. Since no shared disk is required, this method can be applied on highly scalable processor clusters consisting of standard PCs with local disks, connected via a data switch. The approach taken, which uses a ROLAP representation of the data cube, is well suited to large data warehouses with high-dimensional data, and supports the generation of both fully materialized and partially materialized cubes. In comparison with previous approaches, our new method significantly improves scalability with respect to both the number of processors and the I/O bandwidth (number of parallel disks). We have implemented our new parallel shared-nothing data cube generation method and evaluated it on a PC cluster, exploring relative speedup, scaleup, sizeup, output sizes and data skew. For a fact table with 16 million rows and 8 attributes, our parallel data cube generation method achieves close to optimal speedup for as many as 32 processors, generating a full data cube in under 7 minutes. For a fact table with 256 million rows and 8 attributes, our parallel method achieves optimal speedup for 32 processors, generating a full data cube consisting of ≈ 7 billion rows (200 Gigabytes) in under 88 minutes.

¹ Research partially supported by the Natural Sciences and Engineering Research Council of Canada.
² Corresponding author.
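To make the term "full data cube" concrete: for a fact table with d dimension attributes, a fully materialized cube contains one aggregated view per subset of the attributes, i.e. 2^d group-bys (256 for the 8 attributes mentioned in the abstract). The following is a minimal sequential sketch of that definition only. It is not the parallel partitioning method of the paper, and the function name `full_cube`, the toy fact table, and the choice of SUM as the aggregate are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def full_cube(rows, dims, measure):
    """Naively materialize every group-by of `dims`, summing `measure`.

    Returns a dict mapping each attribute subset (a tuple) to a view,
    where a view maps key tuples to the summed measure.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            view = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                view[key] += row[measure]
            cube[group] = dict(view)
    return cube


# Hypothetical toy fact table: three dimensions and one measure.
facts = [
    {"store": "A", "product": "x", "year": 2001, "sales": 10},
    {"store": "A", "product": "y", "year": 2001, "sales": 5},
    {"store": "B", "product": "x", "year": 2002, "sales": 7},
]

cube = full_cube(facts, ["store", "product", "year"], "sales")
print(len(cube))          # number of views: 2**3
print(cube[("store",)])   # sales aggregated by store
print(cube[()])           # grand total
```

This naive version rescans the fact table once per view, which is exactly the cost that motivates pre-computation strategies such as the one described in the abstract.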
Similar resources
Improved Data Partitioning for Building Large ROLAP Data Cubes in Parallel
The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. However, as the size of data warehouses grows, the time it takes to perform this pre-computation becomes a significant performance bottleneck. This paper presents an improved parallel meth...
Parallel Multi-Dimensional ROLAP Indexing
This paper addresses the query performance issue for Relational OLAP (ROLAP) datacubes. We present a distributed multi-dimensional ROLAP indexing scheme which is practical to implement, requires only a small communication volume, and is fully adapted to distributed disks. Our solution is efficient for spatial searches in high dimensions and scalable in terms of data sizes, dimensions, and numbe...
Parallel Multi-Dimensional ROLAP Indexing
This article addresses the query performance issue for Relational OLAP (ROLAP) datacubes. We present RCUBE, a distributed multidimensional ROLAP indexing scheme which is practical to implement, requires only a small communication volume, and is fully adapted to distributed disks. Our solution is efficient for spatial searches in high dimensions and scalable in terms of data sizes, dimensions, a...
Rolap and molap pdf
Data Warehousing and OLAP: MOLAP and ROLAP. Dr. Toon Calders, t.calderstue.nl. Data cubes as a.dimensional OLAP MOLAP approach, but also a relational OLAP ROLAP solution. Relational OLAP ROLAP using SAS OLAP Server and Teradata. Describes the advantages and the disadvantages of the different OLAP technologies: MOLAP, ROLAP, and HOLAP. ROLAP relational online analytical processing is an alternative ...
Business Intelligence: Multidimensional Data Analysis
The relational database model is probably the most frequently used database model today. It has its strengths, but it doesn’t perform very well with complex queries and analysis of very large sets of data. As computers have grown more potent, resulting in the possibility to store very large data volumes, the need for efficient analysis and processing of such data sets has emerged. The concept o...